ABSTRACT

"Linear threshold element" is the generic term for a device which
forms the sum $a_0 + a_1 x_1 + a_2 x_2 + \dots + a_d x_d$ from an input vector
$(x_1, x_2, \dots, x_d)$ and yields one of two outputs depending on whether or not the
sum is positive. A pattern classification machine may utilize a linear
threshold element along with a controller which receives whichever of the
two values corresponds to correct classification of the input vector.
The purpose of the controller is to modify the gain vector
$(a_0, a_1, \dots, a_d)$ so that the next input vector has a greater likelihood
of being correctly classified by the threshold element.

This likelihood depends on the value of the gain vector, and an
adaptive algorithm of the "steepest descent" variety can be used to
attempt to adjust the gain vector to its optimal value as the machine is
exposed to a stationary sequence of statistically independent input
vectors. The components of these vectors are commonly two valued, and it
has been shown that convergence of the expected value of the gain vector
is dependent on the value of the adjustment parameter, the values of the
components, and the distribution of the input vectors. It is shown herein
that a bound on the adjustment parameter, simply related to the values of
the input components, is sufficient to ensure this convergence. The
variance of the gain vector is derived under the assumptions of a uniform
input sequence and oppositely signed components of equal magnitude, and
it is shown that a similar bound on the adjustment parameter implies
convergence of the variance. The variance is graphed under representative
conditions.
1. Introduction.

"Linear threshold element" (LTE) is the generic term for a device
which forms the sum $a_0 + a_1 x_1 + a_2 x_2 + \dots + a_d x_d$ from an input vector
$(x_1, x_2, \dots, x_d)$ and yields one of two outputs depending on whether
or not the sum is positive. The components of the input vectors are
commonly two valued, and therefore the total number of possible input
vectors is $2^d$. When used in a pattern classification machine (PCM) the
output of the LTE classifies each input into one of two classifications.
This classification may or may not be correct. The correct classification
is given by the environment, a fixed but unknown function defined
on the same set of input vectors, whose output is either of the two
values of the LTE.

As an example of an environment, consider a handwritten letter of
the alphabet which is "read" by an array of mark sensing devices. The
input to the environment is the pattern from the sensing devices and
the output is whether or not the letter which was read is a particular
letter. Consider also medical diagnosis. The electrocardiogram of a
particular human heart may be sensed as above, and the output of the
environment is whether or not a particular anomaly is present. In either
case, if the same sensing pattern is presented to an LTE, it will also
make a classification. The number of similar applications is large.

In addition to the LTE, a PCM may use a controller. This device
receives the environmental response to an input vector and attempts to
modify the gain vector $(a_0, a_1, a_2, \dots, a_d)$ so that the next input
vector has a greater likelihood of being correctly classified by the
LTE. Several algorithms have been proposed to adjust the gain vector. [1]
The method under consideration is the steepest descent algorithm of
Widrow and Hoff. [2,3] It will be discussed in section 2.

The gain vector is given an initial setting $a(1)$, and then the
PCM is exposed to a potentially infinite sequence of input vectors
$\{X(n)\}$ and the corresponding sequence of correct output values from
the environment $\{f[X(n)]\}$, $n = 1, 2, 3, \dots$ . After each input vector is
presented, the gain vector is adjusted by the controller to yield the
sequence $\{a(n)\}$. Under certain conditions, this sequence will
converge to a terminal vector $a^*$ which is optimal in some sense for the
correct classification of an input vector by the LTE.

It is assumed that the sequence of input vectors is a strictly
stationary stochastic sequence of independent random variables, i.e.
$X(n) = X$, where $X$ is a random variable whose statistical properties are
completely described by the probability vector $P = (p_1, p_2, \dots, p_{2^d})$,
and $p_j = \Pr\{X = X^j\} > 0$ for $j = 1, 2, 3, \dots, 2^d$ and $\sum_{j=1}^{2^d} p_j = 1$.
A frequent example is the uniform input sequence: $p_j = 2^{-d}$ for all $j$, which
implies that the occurrence of each of the $2^d$ input vectors is equally
likely.

The LTE is used to predict the environment, and a measure of its 
ability to perform this task is the state of the PCM, S(a), defined as 
the expected value of the squared difference between the responses of 
the LTE and the environment with respect to the random variable X. The 
state of the PCM is zero if and only if the responses of the LTE and 
the environment are equal for all input vectors. The task of the
controller is to minimize S with respect to the gain vector.

Due to the discontinuity of the step function, the minimization of
S will not be without difficulty. Consider an auxiliary measure of the
performance of the PCM, $Q(a)$, the expected value of the squared
difference between the response of the environment and the sum
$a_0 + a_1 x_1 + \dots + a_d x_d$ which is internally generated by the LTE. This auxiliary
measure is the one chosen for minimization, and it is believed that the
following theorem is correct, although a satisfactory proof is not
known to exist.

Theorem 1. If $Q(a^*) \le Q(a)$ for all gain vectors $a$,
then $S(a^*) \le S(a)$ for all gain vectors $a$.

To summarize the assumptions and notation, let the possible values
of the components of the input vectors be $q$ and $r$; then the LTE, $g$, is
defined on the set

$$B^d = \{\, X^j : X^j = (x_0^j, x_1^j, \dots, x_d^j)^T;\ x_0^j = \max[|q|, |r|];\ j = 1, 2, \dots, 2^d \,\}.$$

Let $A = \{\, a : a = (a_0, a_1, \dots, a_d)^T \,\}$ be the set of gain vectors.
Then $g(X) = \mathrm{sgn}(a^T X)$, where

$$\mathrm{sgn}(t) = \begin{cases} 1, & \text{if } t > 0 \\ -1, & \text{otherwise,} \end{cases}$$

and $R = \{1, -1\}$ is the range set of $g$. The environment, $f$, also maps
$B^d$ into $R$. Note that each environment could be interpreted as one of
the $2^{2^d}$ Boolean functions of $d$ binary variables. The measures are as
follows, where the overbar denotes expected value:

$$S(a) = \overline{[f(X) - g(X)]^2}$$

$$Q(a) = \overline{[f(X) - a^T X]^2}.$$

$X$ is a random variable which takes on values in $B^d$, all with a positive
probability, in accordance with the probability vector $P$. Note that
all functions of $X$ are therefore random variables.
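The two measures can be evaluated directly for small d by enumerating every input vector. The following is a minimal sketch, not part of the original report: the function names, the default values q = -1 and r = 1, and the sample environment are all assumptions chosen for illustration.

```python
from itertools import product

def input_vectors(d, q=-1.0, r=1.0):
    """All 2**d augmented inputs (x0, x1, ..., xd) with x0 = max(|q|, |r|)."""
    x0 = max(abs(q), abs(r))
    return [(x0,) + bits for bits in product((q, r), repeat=d)]

def sgn(t):
    """Two-valued threshold of the LTE: +1 if t > 0, otherwise -1."""
    return 1 if t > 0 else -1

def dot(a, x):
    return sum(ai * xi for ai, xi in zip(a, x))

def measures(a, f, d, probs=None, q=-1.0, r=1.0):
    """Return (S(a), Q(a)) as expectations over the input distribution."""
    xs = input_vectors(d, q, r)
    if probs is None:                       # default: uniform input sequence
        probs = [1.0 / len(xs)] * len(xs)
    S = sum(p * (f(x) - sgn(dot(a, x))) ** 2 for p, x in zip(probs, xs))
    Q = sum(p * (f(x) - dot(a, x)) ** 2 for p, x in zip(probs, xs))
    return S, Q

# Hypothetical environment: "is the first input component positive?"
f = lambda x: sgn(x[1])
print(measures((0.0, 1.0, 0.0), f, d=2))    # gain vector that copies x1
print(measures((0.0, 0.0, 0.0), f, d=2))    # all-zero gain vector
```

With the gain vector that copies the first component, both measures vanish for this environment; with the zero gain vector the LTE always answers -1, so S and Q are both positive.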
2. The Adaptive Algorithm

The problem is to minimize

$$Q(a) = \overline{[f(X) - a^T X]^2} = \overline{f^2(X)} - 2\sum_{l=0}^{d} a_l\, \overline{f(X)\,x_l} + \sum_{l=0}^{d}\sum_{k=0}^{d} a_l a_k\, \overline{x_l x_k}.$$

Setting $\dfrac{\partial Q(a)}{\partial a_k} = 0,\ \forall k$, yields

$$\sum_{l=0}^{d} a_l\, \overline{x_l x_k} = \overline{f(X)\,x_k}, \quad \forall k. \tag{1}$$

That value of the gain vector which satisfies equation (1) is the
vector $a^*$, which minimizes $Q(a)$. But since $f$ is unknown, the equation
cannot be solved directly.

Using the method of steepest descent, the gain vector is modified
after each presentation of an input vector:

$$a(n+1) = a(n) - \eta\, \mathrm{grad}\, Q_n,$$

where

$a(n)$ is the value of the gain vector at the time of the $n$th presentation;

$Q_n = \{ f[X(n)] - a^T(n) X(n) \}^2$;

$X(n)$ is the input vector;

$\mathrm{grad}\, Q_n = \left( \dfrac{\partial Q_n}{\partial a_0}, \dfrac{\partial Q_n}{\partial a_1}, \dots, \dfrac{\partial Q_n}{\partial a_d} \right)$;

$\eta$ is a positive constant.

Each component is adjusted in the direction of decreasing $Q_n$.
Specifically,

$$a_i(n+1) = a_i(n) + 2\eta\, x_i(n) \{ f[X(n)] - a^T(n) X(n) \}, \quad \forall i. \tag{2}$$
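The update of equation (2) is short enough to simulate directly. The sketch below is illustrative only: the helper names, the zero initial gain vector, the seeded pseudo-random generator, and the sample environment are assumptions, and the inputs are taken to be uniformly distributed with components of magnitude r.

```python
import random
from itertools import product

def lms_step(a, x, target, eta):
    """Equation (2): a_i <- a_i + 2*eta*x_i*(f(X) - a^T X) for every i."""
    err = target - sum(ai * xi for ai, xi in zip(a, x))
    return [ai + 2.0 * eta * xi * err for ai, xi in zip(a, x)]

def run_lms(f, d, eta, steps, r=1.0, seed=0):
    """Present `steps` independent, uniformly drawn inputs; return the final gains."""
    rng = random.Random(seed)
    inputs = [(r,) + bits for bits in product((-r, r), repeat=d)]
    a = [0.0] * (d + 1)               # initial setting a(1); zero is an arbitrary choice
    for _ in range(steps):
        x = rng.choice(inputs)
        a = lms_step(a, x, f(x), eta)
    return a

# Hypothetical environment: the sign of the first input component.
env = lambda x: 1 if x[1] > 0 else -1
print(run_lms(env, d=2, eta=0.05, steps=2000))   # should approach (0, 1, 0)
```

This environment can be reproduced exactly by a gain vector, so the error term in the update shrinks to zero and the sequence settles; for environments that cannot be matched exactly the gains keep fluctuating around their mean.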
It is to be noted that $a(n)$ is a random variable. It is now to be shown that $\overline{a(n)}$
converges to $a^*$. Martinez shows this as follows: [4]

Rewrite equation (2) as

$$a(n+1) = a(n) + \beta\,[\, b_n - C_n\, a(n) \,] \tag{3}$$

where the adjustment parameter $\beta = 2\eta$, $b_n = f[X(n)]\,X(n)$,
and $C_n = X(n) X^T(n)$. Note that $C_n^T = C_n$.
Hence

$$a(n+1) = \beta b_n + [\, I - \beta C_n \,]\, a(n) = d_n + E_n\, a(n), \tag{4}$$

where $d_n = \beta b_n$, and $E_n = I - \beta C_n$.

Expanding recursively and taking expected values,

$$\overline{a(n)} = \overline{d_{n-1} + E_{n-1} d_{n-2} + E_{n-1} E_{n-2} d_{n-3} + \dots + E_{n-1} E_{n-2} \cdots E_1\, a(1)}. \tag{5}$$

Due to the assumptions of independence and stationarity on the
input sequence, equation (5) becomes

$$\overline{a(n)} = [\, I + E + E^2 + \dots + E^{n-2} \,]\, D + E^{n-1} a(1),$$

where $E = \overline{E_n}$ and $D = \overline{d_n}$,

$$\overline{a(n)} = [\, I - E \,]^{-1} [\, I - E^{n-1} \,]\, D + E^{n-1} a(1). \tag{6}$$

If $\lim_{n\to\infty} E^n = 0$, then

$$\lim_{n\to\infty} \overline{a(n)} = [\, I - E \,]^{-1} D.$$

It is then shown that if $C = \overline{C_n}$ is positive definite, and if
$\beta < \dfrac{2}{\lambda}$, where $\lambda$ is the largest eigenvalue of $C$, then $\lim_{n\to\infty} E^n = 0$. Hence,
if $C$ is positive definite, and the positive constant $\eta$ is less than
the reciprocal of the largest eigenvalue of $C$, then the modification
given by equation (2) will cause $\overline{a(n)}$ to converge to $a^*$, since
$[\, I - E \,]^{-1} D = C^{-1} b$, where $b = \overline{b_n}$, and if $a'$ is the terminal value of
$\overline{a(n)}$, then $b = C a'$, which is the minimizing equation (1).
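The behaviour of the mean can be checked numerically by forming b and C for a chosen input distribution and iterating the expected form of equation (3). This is a sketch under stated assumptions: the function names, the non-uniform probability vector, and the sample environment are hypothetical, and the environment is picked so that the minimizing gain vector is known in advance to be (0, 1, 0).

```python
from itertools import product

def moments(f, d, probs, q=-1.0, r=1.0):
    """Return b = E[f(X) X] and C = E[X X^T] for the given input probabilities."""
    x0 = max(abs(q), abs(r))
    inputs = [(x0,) + bits for bits in product((q, r), repeat=d)]
    n = d + 1
    b = [sum(p * f(x) * x[i] for p, x in zip(probs, inputs)) for i in range(n)]
    C = [[sum(p * x[i] * x[k] for p, x in zip(probs, inputs)) for k in range(n)]
         for i in range(n)]
    return b, C

def mean_gain(b, C, beta, a1, n):
    """Iterate m <- m + beta*(b - C m) a total of n-1 times, starting from a(1)."""
    m = list(a1)
    for _ in range(n - 1):
        Cm = [sum(cik * mk for cik, mk in zip(row, m)) for row in C]
        m = [mi + beta * (bi - ci) for mi, bi, ci in zip(m, b, Cm)]
    return m

# Hypothetical setup: d = 2, non-uniform inputs, environment = sign of x1.
env = lambda x: 1 if x[1] > 0 else -1
b, C = moments(env, d=2, probs=[0.1, 0.2, 0.3, 0.4])
print(mean_gain(b, C, beta=0.5, a1=[0.0, 0.0, 0.0], n=300))
```

The value beta = 0.5 used here lies below 2/((d+1)r^2) = 2/3, so the iteration is expected to settle at a vector satisfying b = Ca.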

Under what conditions is $C$ positive definite? Consider $C^j = X^j X^{jT}$,
and let $Z$ be a nonzero vector. Then $Z^T C^j Z = Z^T X^j X^{jT} Z = (Z^T X^j)^2 \ge 0$,
which shows that $C^j$ is positive semidefinite for all $j$.

$C = \sum_{j=1}^{2^d} p_j C^j$, and if $Z$ is as above, then

$$Z^T C Z = \sum_{j=1}^{2^d} p_j\, Z^T C^j Z \ge 0.$$

Therefore $C$ is always positive semidefinite. That it is, in fact,
always positive definite is shown by contradiction. Assume $\exists\, W \ne 0$ such that
$W^T C W = 0$.

Then $p_j W^T C^j W = 0$ for all $j$. Since $p_j > 0$ for all $j$,
$W^T C^j W = 0$ for all $j$, which implies that
$(W^T X^j)^2 = 0$ for all $j$, and

$$W^T X^j = 0 \quad \text{for all } j. \tag{7}$$

Equation (7) is a linear system of $2^d$ equations, and if the vector $W$ is
a solution, then it must satisfy a subsystem of $d+1$ of the equations.
Let $\{ X^{b_1}, X^{b_2}, \dots, X^{b_{d+1}} \}$ be a set of linearly independent vectors from
$B^d$. Such a set exists since $B^d$ spans $(d+1)$-space. Hence

$$W^T ( X^{b_1}\ X^{b_2}\ \cdots\ X^{b_{d+1}} ) = 0. \tag{8}$$

Now a nontrivial solution to (8) exists if and only if

$$| X^{b_1}\ X^{b_2}\ \cdots\ X^{b_{d+1}} | = 0.$$

But this set of vectors is linearly independent and their determinant
is nonzero. Hence for all nonzero vectors $W$, $W^T C W > 0$, and $C$ is
always positive definite.

What is the range of the eigenvalues of $C$? For each $C^j$,
$\mathrm{diag}(C^j) = ( (x_0^j)^2, (x_1^j)^2, \dots, (x_d^j)^2 )$, and it follows that
$\mathrm{diag}(C) = ( e_0, e_1, \dots, e_d )$, where

$$e_i = \sum_{j=1}^{2^d} p_j (x_i^j)^2 \quad \text{for all } i.$$

There is no loss of generality in assuming $|q| \le |r|$, and in that case
$e_0 = r^2$ and $q^2 \le e_i \le r^2$ for $i = 1, 2, \dots, d$.

Hence $\mathrm{trace}(C) = \sum_{i=0}^{d} e_i \le (d+1) r^2$.

Let $\lambda_i$ be the eigenvalues of $C$. Then, since $\mathrm{trace}(C) = \sum_{i=0}^{d} \lambda_i$
and $\lambda_i > 0$ for all $i$,

$$\lambda < \sum_{i=0}^{d} \lambda_i \le (d+1) r^2.$$

It then follows that if the adjustment parameter $\beta \le \dfrac{2}{(d+1) r^2}$, then
$\overline{a(n)}$ will converge to $a^*$, since $C$ is always positive definite and

$$\beta \le \frac{2}{(d+1) r^2} \;\Rightarrow\; \beta < \frac{2}{\lambda}.$$
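The trace bound on the largest eigenvalue can be illustrated numerically. The sketch below is an assumption-laden example, not the report's method: it estimates the largest eigenvalue of C by power iteration (a substituted technique chosen because it needs no external library), and the probability vector is hypothetical. Power iteration assumes the starting vector is not orthogonal to the dominant eigenvector, which holds for this example.

```python
from itertools import product

def second_moment(d, probs, q=-1.0, r=1.0):
    """C = E[X X^T] over the 2**d augmented input vectors."""
    x0 = max(abs(q), abs(r))
    inputs = [(x0,) + bits for bits in product((q, r), repeat=d)]
    n = d + 1
    return [[sum(p * x[i] * x[k] for p, x in zip(probs, inputs)) for k in range(n)]
            for i in range(n)]

def largest_eigenvalue(C, iters=500):
    """Power iteration on a symmetric positive definite matrix."""
    v = [1.0] * len(C)
    lam = 0.0
    for _ in range(iters):
        w = [sum(c * vi for c, vi in zip(row, v)) for row in C]
        lam = sum(wi * wi for wi in w) ** 0.5   # |C v|; v has unit length after pass 1
        v = [wi / lam for wi in w]
    return lam

d, r = 2, 1.0
C = second_moment(d, probs=[0.1, 0.2, 0.3, 0.4], r=r)
lam = largest_eigenvalue(C)
trace = sum(C[i][i] for i in range(d + 1))
print(lam, trace, (d + 1) * r ** 2)            # largest eigenvalue, trace, bound
print(2.0 / ((d + 1) * r ** 2) < 2.0 / lam)    # the sufficient bound is the stricter one
```

The design point is that the bound 2/((d+1)r^2) can be computed from the input values alone, without knowing the probability vector or solving for eigenvalues.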

The remainder of this section examines the convergence of $\overline{a(n)}$ under
the following assumptions.

(1) The components of the input vectors are of equal magnitude
and oppositely signed, i.e. $0 < -q = r$.

(2) The input sequence is uniform, i.e. $p_j = 2^{-d}$ for all $j$.

With these assumptions $C$ will be shown to reduce to $r^2 I$, which
simplifies further analysis without sacrificing a great deal of applicability,
since any PCM can be reworked to make the transformation of (1) above,
and (2) above is a common ad hoc approach to a practical situation.

Consider the following constructive scheme for the input vectors
which is analogous to the binary representation of the integers.

$$\begin{aligned}
X^1 &= ( r, -r, -r, \dots, -r, -r, -r ) \\
X^2 &= ( r, -r, -r, \dots, -r, -r, r ) \\
X^3 &= ( r, -r, -r, \dots, -r, r, -r ) \\
X^4 &= ( r, -r, -r, \dots, -r, r, r ) \\
&\ \ \vdots \\
X^{2^d} &= ( r, r, r, \dots, r, r, r )
\end{aligned}$$

This scheme could also be displayed as the matrix $M = (m_{ij})$ with each
element of the form

$$m_{ij} = \frac{x_i^j}{r},$$

where (contrary to the usual convention)
$i$ is the column index - which refers to the components of a particular
input vector - and $j$ is the row index - which refers to a particular
vector from $B^d$. The order of $M$ is $2^d \times (d+1)$. An analytic expression
for $m_{ij}$, $i = 1, 2, \dots, d$, is

$$m_{ij} = \begin{cases} -1, & \text{if } (j-1) \bmod 2^{\,d-i+1} < 2^{\,d-i} \\ \phantom{-}1, & \text{otherwise,} \end{cases}$$

with $m_{0j} = 1$ for all $j$.
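$$M = \begin{array}{l|ccccccc}
 & \text{col } 0 & \text{col } 1 & \text{col } 2 & \cdots & \text{col } d-2 & \text{col } d-1 & \text{col } d \\ \hline
\text{row } 1 & 1 & -1 & -1 & \cdots & -1 & -1 & -1 \\
\text{row } 2 & 1 & -1 & -1 & \cdots & -1 & -1 & 1 \\
\text{row } 3 & 1 & -1 & -1 & \cdots & -1 & 1 & -1 \\
\text{row } 4 & 1 & -1 & -1 & \cdots & -1 & 1 & 1 \\
\text{row } 5 & 1 & -1 & -1 & \cdots & 1 & -1 & -1 \\
\vdots & & & & & & & \\
\text{row } 2^{d-2} & 1 & -1 & -1 & \cdots & 1 & 1 & 1 \\
\text{row } 2^{d-2}+1 & 1 & -1 & 1 & \cdots & -1 & -1 & -1 \\
\vdots & & & & & & & \\
\text{row } 2^{d-1} & 1 & -1 & 1 & \cdots & 1 & 1 & 1 \\
\text{row } 2^{d-1}+1 & 1 & 1 & -1 & \cdots & -1 & -1 & -1 \\
\vdots & & & & & & & \\
\text{row } 2^{d} & 1 & 1 & 1 & \cdots & 1 & 1 & 1
\end{array}$$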

Now it is necessary to show that the column vectors of $M$ are
mutually orthogonal. Using the method of induction on the dimension of
the PCM, let $d = 1$. Then

$$M_1 = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} = ( N_0\ N_1 ),$$

where $N_0$ is the column vector $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$
and $N_1$ is the column vector $\begin{pmatrix} -1 \\ 1 \end{pmatrix}$.
There is only one pair of vectors to
check: $N_0^T N_1 = -1 + 1 = 0$. The vectors are orthogonal. Now let $d = \ell$
and $M_\ell = ( C_0\ C_1\ C_2 \cdots C_\ell )$, where $C_i$ is the $i$th column vector of $M_\ell$. The
induction assumption is that the column vectors of $M_\ell$ are mutually
orthogonal. Then if $K_i$ is the $i$th column vector for the dimension $\ell+1$,
$M_{\ell+1} = ( K_0\ K_1\ K_2 \cdots K_{\ell+1} )$, which can be partitioned
with respect to rows as

$$M_{\ell+1} = \begin{pmatrix} C_0 & -C_0 & C_1 & \cdots & C_\ell \\ C_0 & \phantom{-}C_0 & C_1 & \cdots & C_\ell \end{pmatrix}.$$

With this partitioning any column vector of $M_{\ell+1}$ can be expressed as a
direct sum (a physical concatenation) of the column vectors of $M_\ell$, to wit [5]
$K_0 = C_0 \oplus C_0$, $K_1 = (-C_0) \oplus C_0$, and $K_i = C_{i-1} \oplus C_{i-1}$ for $i = 2, 3, \dots, \ell+1$.
Then any product of the form $K_0^T K_i$ for $i = 2, 3, \dots, \ell+1$ can
be expressed as

$$( C_0 \oplus C_0 )^T ( C_{i-1} \oplus C_{i-1} ) = C_0^T C_{i-1} + C_0^T C_{i-1} = 0 + 0 = 0,$$

with $K_0^T K_1 = ( C_0 \oplus C_0 )^T ( (-C_0) \oplus C_0 ) = -C_0^T C_0 + C_0^T C_0 = 0$.

Similarly, any product of the form $K_1^T K_i$ for $i = 2, 3, \dots, \ell+1$ can be
expressed as

$$( (-C_0) \oplus C_0 )^T ( C_{i-1} \oplus C_{i-1} ) = -C_0^T C_{i-1} + C_0^T C_{i-1} = 0,$$

and any product of the form $K_i^T K_j$ for
$1 < i < j \le \ell+1$ can be expressed as

$$( C_{i-1} \oplus C_{i-1} )^T ( C_{j-1} \oplus C_{j-1} ) = 2\, C_{i-1}^T C_{j-1} = 0,$$

since all $C_i$'s are orthogonal.
Thus the column vectors of $M_{\ell+1}$ are mutually orthogonal, and by induction
it is true that for all positive integral values of $d$, the columns of the
matrix $M$ are mutually orthogonal.

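Now consider the elements of $C = \sum_{k=1}^{2^d} p_k C^k$, where $p_k = 2^{-d}$, and the
input components are $\pm r$.

$$\begin{aligned}
c_{ij} &= 2^{-d} \sum_{k=1}^{2^d} c^k_{ij} \\
&= 2^{-d} \sum_{k=1}^{2^d} ( X^k X^{kT} )_{ij} \\
&= 2^{-d} \sum_{k=1}^{2^d} x_i^k x_j^k \\
&= 2^{-d}\, r^2 \sum_{k=1}^{2^d} m_{ik}\, m_{jk}, \quad \text{since } m_{ij} = \frac{x_i^j}{r}.
\end{aligned}$$

This reduces to $c_{ij} = r^2 \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta,
since $\sum_{k=1}^{2^d} m_{ik}\, m_{jk} = 0$ if $i \ne j$, due to the orthogonality of the
column vectors of the matrix $M$, and equals $2^d$ if $i = j$, since every entry of $M$ is $\pm 1$. Hence $C = r^2 I$.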
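The orthogonality argument and the reduction of C to r^2 I can be confirmed for small d by building the sign matrix explicitly. This is a minimal sketch: the function names are hypothetical, and the rule used for the entries (column 0 constant, column i changing sign in blocks of 2^(d-i) rows, most slowly for i = 1) is this sketch's reading of the binary-counting construction, stated here as an assumption.

```python
def m_entry(i, j, d):
    """Sign of component i (column) in input vector j (row, counted from 1)."""
    if i == 0:
        return 1                          # column 0 is the constant component
    return -1 if (j - 1) % 2 ** (d - i + 1) < 2 ** (d - i) else 1

def build_M(d):
    """The 2**d by (d+1) matrix of signs, rows ordered like binary counting."""
    return [[m_entry(i, j, d) for i in range(d + 1)] for j in range(1, 2 ** d + 1)]

def columns_orthogonal(M):
    """True when every pair of distinct columns has zero inner product."""
    n = len(M[0])
    return all(sum(row[i] * row[k] for row in M) == 0
               for i in range(n) for k in range(i + 1, n))

def second_moment_uniform(d, r):
    """C = 2**-d * sum_k X^k X^kT with X^k = r * (row k of M)."""
    M = build_M(d)
    n = d + 1
    return [[sum((r * row[i]) * (r * row[k]) for row in M) / 2 ** d
             for k in range(n)] for i in range(n)]

print(build_M(2))
print(columns_orthogonal(build_M(4)))
print(second_moment_uniform(3, r=2.0))    # expected to be 4 times the identity
```

Integer signs are kept separate from the magnitude r so that the orthogonality check uses exact arithmetic.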



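Under these assumptions equation (6) becomes

$$\begin{aligned}
\overline{a(n)} &= [\, I - E \,]^{-1} [\, I - E^{n-1} \,]\, D + E^{n-1} a(1) \\
&= [\, \beta C \,]^{-1} [\, I - ( I - \beta C )^{n-1} \,]\, D + [\, I - \beta C \,]^{n-1} a(1) \\
&= [\, \beta r^2 I \,]^{-1} [\, I - ( I - \beta r^2 I )^{n-1} \,]\, D + [\, I - \beta r^2 I \,]^{n-1} a(1) \\
&= \frac{1}{\beta r^2} [\, 1 - ( 1 - \beta r^2 )^{n-1} \,]\, \beta b + ( 1 - \beta r^2 )^{n-1} a(1) \\
&= \frac{1}{r^2} [\, 1 - ( 1 - \beta r^2 )^{n-1} \,]\, C a^* + ( 1 - \beta r^2 )^{n-1} a(1) \qquad (9) \\
&= [\, 1 - ( 1 - \beta r^2 )^{n-1} \,]\, a^* + ( 1 - \beta r^2 )^{n-1} a(1) \\
&= a^* - [\, a^* - a(1) \,] ( 1 - \beta r^2 )^{n-1}. \qquad (10)
\end{aligned}$$

Equation (1) can be written as $b = C a^*$, which becomes $b = r^2 a^*$, and
the optimal value of the gain vector, $a^* = \dfrac{b}{r^2}$.

Then

$$\begin{aligned}
Q(a^*) &= \overline{f^2(X)} - 2 \sum_{l=0}^{d} a_l^*\, \overline{f(X)\, x_l} + \sum_{l=0}^{d} \sum_{k=0}^{d} a_l^* a_k^*\, \overline{x_l x_k} \\
&= 1 - 2 \sum_{l=0}^{d} a_l^*\, r^2 a_l^* + \sum_{l=0}^{d} \sum_{k=0}^{d} a_l^* a_k^*\, r^2 \delta_{lk} \\
&= 1 - r^2 (a^*)^2, \qquad (11)
\end{aligned}$$

where $(a^*)^2$ denotes $(a^*)^T a^*$.

From equation (11) it is evident that if $Q(a^*)$, the optimum of the
minimization effort, is near zero, then $(a^*)^2$ is near $r^{-2}$. Note
also the range of $(a^*)^2$, between zero and $r^{-2}$.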
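Under assumptions (1) and (2) the optimum and the mean trajectory reduce to simple closed forms, which the following sketch evaluates. It is illustrative only: the function names and the sample environment (a majority vote over three inputs) are assumptions, and the closed form coded in `mean_gain_closed_form` is a* - [a* - a(1)](1 - beta r^2)^(n-1) as obtained above for C = r^2 I.

```python
from itertools import product

def optimum(f, d, r=1.0):
    """With C = r^2 I: a* = b / r^2 and Q(a*) = 1 - r^2 * (a*)^T a*."""
    inputs = [(r,) + bits for bits in product((-r, r), repeat=d)]
    b = [sum(f(x) * x[i] for x in inputs) / len(inputs) for i in range(d + 1)]
    a_star = [bi / r ** 2 for bi in b]
    q_min = 1.0 - r ** 2 * sum(ai * ai for ai in a_star)
    return a_star, q_min

def mean_gain_closed_form(a_star, a1, beta, r, n):
    """Mean gain at presentation n: a* - (a* - a(1)) * (1 - beta r^2)^(n-1)."""
    rho = (1.0 - beta * r ** 2) ** (n - 1)
    return [s - (s - a) * rho for s, a in zip(a_star, a1)]

# Hypothetical environment: majority vote of three +/- r inputs.
majority = lambda x: 1 if sum(x[1:]) > 0 else -1
a_star, q_min = optimum(majority, d=3)
print(a_star, q_min)
print(mean_gain_closed_form(a_star, [0.0] * 4, beta=0.1, r=1.0, n=50))
```

For this environment Q(a*) is strictly positive, meaning no gain vector reproduces the environment's response as a linear sum, even though the thresholded output of the LTE may still classify every input correctly.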
3. The Variance of the Gain Vector.

Consider the variance of $a(n)$, $V[a(n)] = \overline{a^2(n)} - \overline{a(n)}^2$. It is
necessary to compute $\overline{a^2(n)} = \overline{a^T(n)\, a(n)}$, which will be done under assumption (1),
the components are $\pm r$. From equation (4),

$$a(n+1) = d_n + E_n\, a(n),$$

where $d_n = \beta f[X(n)]\, X(n)$,
$I - E_n = \beta X(n) X^T(n)$,
and let $0 < \beta < \dfrac{1}{(d+1) r^2}$.

Then

$$\begin{aligned}
a^2(n+1) &= [\, d_n^T + a^T(n) E_n^T \,][\, d_n + E_n\, a(n) \,] \\
&= d_n^T d_n + 2\, d_n^T E_n\, a(n) + a^T(n) E_n^T E_n\, a(n),
\end{aligned}$$

which can be written as

$$a^2(n+1) = \beta^2 r^2 (d+1) + 2 [\, 1 - \beta r^2 (d+1) \,]\, d_n^T a(n) + a^T(n) \{\, I + \beta [\, \beta r^2 (d+1) - 2 \,]\, C_n \,\}\, a(n), \tag{12}$$

since

$$d_n^T d_n = \beta^2 b_n^T b_n = \beta^2 f[X(n)]\, X^T(n)\, f[X(n)]\, X(n) = \beta^2 X^T(n) X(n) = \beta^2 r^2 (d+1),$$

and

$$\begin{aligned}
d_n^T E_n &= d_n^T [\, I - \beta C_n \,] = \beta f[X(n)]\, X^T(n) [\, I - \beta X(n) X^T(n) \,] \\
&= \beta f[X(n)] [\, X^T(n) - \beta r^2 (d+1) X^T(n) \,] \\
&= [\, 1 - \beta r^2 (d+1) \,]\, d_n^T,
\end{aligned}$$

and

$$\begin{aligned}
E_n^T E_n &= I - 2 \beta C_n + \beta^2 C_n^2 \\
&= I - 2 \beta C_n + \beta^2 r^2 (d+1)\, C_n \\
&= I + \beta [\, \beta r^2 (d+1) - 2 \,]\, C_n.
\end{aligned}$$

Taking the expected value of equation (12),

$$\overline{a^2(n+1)} = \beta^2 r^2 (d+1) + 2 [\, 1 - \beta r^2 (d+1) \,]\, \overline{d_n^T a(n)} + \overline{a^2(n)} + \beta [\, \beta r^2 (d+1) - 2 \,]\, \overline{a^T(n) C_n\, a(n)}. \tag{13}$$

Before considering the expected values of $d_n^T a(n)$ and $a^T(n) C_n\, a(n)$,
recall the following theorem, as stated by Halmos. [6]

If $\{ f_{ij} : i = 1, 2, \dots, k;\ j = 1, 2, \dots, n_i \}$ is a set of
independent functions, if $\varphi_i$ is a real valued, Borel measurable function of
$n_i$ real variables, $i = 1, 2, \dots, k$, and if $f_i(x) = \varphi_i( f_{i1}(x), \dots, f_{i n_i}(x) )$,
then the functions $f_1, \dots, f_k$ are independent.

Now consider the set of independent random variables $\{ X(1), X(2), \dots, X(n-1), X(n) \}$.
By definition $d_n = \beta f[X(n)]\, X(n)$ and is defined on the
subset $\{ X(n) \}$. On the other hand $a(n)$ is defined on the complementary
subset $\{ X(1), \dots, X(n-1) \}$ by the recursion relation of equation (2).
Therefore, applying the theorem to each component of these vectors, the
conclusion is that $a(n)$ and $d_n$ are independent random variables. This
being the case, the expected value of $d_n^T a(n)$ is the product of the
expected values of $d_n$ and $a(n)$. Hence,
$\overline{d_n^T a(n)} = \overline{d_n}^T\, \overline{a(n)} = \beta b^T\, \overline{a(n)}$. Then using $b = C a^*$ and equation
(6),

$$\begin{aligned}
\overline{d_n^T a(n)} &= \beta (a^*)^T C \{\, (I - E)^{-1} (I - E^{n-1}) D + E^{n-1} a \,\}, \quad \text{where } a = a(1), \\
&= \beta (a^*)^T C \{\, [\, I - ( I - \beta C )^{n-1} \,]\, a^* + ( I - \beta C )^{n-1} a \,\} \\
&= \beta (a^*)^T C a^* - \beta (a^*)^T C ( I - \beta C )^{n-1} ( a^* - a ). \qquad (14)
\end{aligned}$$

Next consider the expected value of $a^T(n) C_n\, a(n)$:

$$\begin{aligned}
\overline{a^T(n) C_n\, a(n)} &= \overline{\sum_{i=0}^{d} a_i(n) \sum_{j=0}^{d} x_i(n) x_j(n)\, a_j(n)} \\
&= \sum_{i=0}^{d} \sum_{j=0}^{d} \overline{a_i(n) a_j(n)}\ \overline{x_i(n) x_j(n)}, \qquad (15)
\end{aligned}$$

since $a(n)$ and $X(n)$ satisfy the hypothesis of the independence
theorem above.



The goal now is to express this expected value as $\overline{a^2(n)}\, G(C)$, where $G$
is some unknown function of $C$. Equation (15) is close, but it contains
terms in $\overline{a_i(n) a_j(n)}$ with $i \ne j$, which have not yielded to analysis. Is it
possible that these cross product terms vanish? Or is it possible that
their coefficients $\overline{x_i(n) x_j(n)}$ vanish for $i \ne j$? The latter is, in fact,
exactly the conclusion arrived at by assumption (2), the uniform input
sequence. Under this assumption the derivation can proceed, and it will,
justified by the urgent need for some results, however special.

With $C = r^2 I$ used in the expected value terms, equation (15) becomes

$$\overline{a^T(n) C_n\, a(n)} = \sum_{i=0}^{d} r^2\, \overline{a_i^2(n)} = r^2\, \overline{a^2(n)},$$

and equation (14) becomes

$$\overline{d_n^T a(n)} = \beta r^2 (a^*)^2 - \beta r^2 (a^*)^T ( a^* - a ) ( 1 - \beta r^2 )^{n-1}.$$

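Finally returning to equation (13),

$$\begin{aligned}
\overline{a^2(n+1)} = {} & \beta^2 r^2 (d+1) + 2 \beta r^2 [\, 1 - \beta r^2 (d+1) \,] (a^*)^2 \\
& - 2 \beta r^2 [\, 1 - \beta r^2 (d+1) \,] (a^*)^T ( a^* - a ) ( 1 - \beta r^2 )^{n-1} \\
& + [\, ( 1 - \beta r^2 )^2 + \beta^2 r^4 d \,]\, \overline{a^2(n)}. \qquad (16)
\end{aligned}$$

The structure of equation (16) is more readily apparent if the
following substitutions are made for the constants of the process.

Let $\alpha = \beta r^2 \{\, \beta (d+1) + 2 [\, 1 - \beta r^2 (d+1) \,] (a^*)^2 \,\}$,

$\gamma = -2 \beta r^2 [\, 1 - \beta r^2 (d+1) \,] (a^*)^T ( a^* - a )$,

$\delta = ( 1 - \beta r^2 )^2 + \beta^2 r^4 d$, and

$\rho = ( 1 - \beta r^2 )$.

Then

$$\overline{a^2(n+1)} = \alpha + \gamma \rho^{n-1} + \delta\, \overline{a^2(n)},$$

which when reworked recursively one step becomes,

$$\begin{aligned}
\overline{a^2(n+1)} &= \alpha + \gamma \rho^{n-1} + \delta [\, \alpha + \gamma \rho^{n-2} + \delta\, \overline{a^2(n-1)} \,] \\
&= \alpha ( 1 + \delta ) + \gamma ( \rho + \delta ) \rho^{n-2} + \delta^2\, \overline{a^2(n-1)}, \qquad (17)
\end{aligned}$$

and one more step,

$$\overline{a^2(n+1)} = \alpha ( 1 + \delta + \delta^2 ) + \gamma ( \rho^2 + \rho \delta + \delta^2 ) \rho^{n-3} + \delta^3\, \overline{a^2(n-2)},$$

and through all the steps back to $a(1)$, is

$$\overline{a^2(n+1)} = \alpha \sum_{i=0}^{n-1} \delta^i + \gamma \sum_{i=0}^{n-1} \rho^{n-1-i} \delta^i + \delta^n a^2. \tag{18}$$

The geometric series may be put into closed form, yielding

$$\sum_{i=0}^{n-1} \delta^i = \frac{1 - \delta^n}{1 - \delta}, \quad \text{and} \quad \sum_{i=0}^{n-1} \rho^{n-1-i} \delta^i = \frac{\rho^n - \delta^n}{\rho - \delta}.$$

Now replacing $n$ by $n-1$, and making the above substitutions in equation
(18),

$$\overline{a^2(n)} = \alpha\, \frac{1 - \delta^{n-1}}{1 - \delta} + \gamma\, \frac{\rho^{n-1} - \delta^{n-1}}{\rho - \delta} + \delta^{n-1} a^2. \tag{19}$$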



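The denominators in equation (19) can be rewritten as follows.

$$1 - \delta = \beta r^2 [\, 2 - \beta r^2 (d+1) \,], \quad \text{and}$$

$$\rho - \delta = \beta r^2 [\, 1 - \beta r^2 (d+1) \,].$$

Now making all the substitutions for $\alpha$, $\gamma$, and $\rho$, and cancelling the
new forms of the denominators,

$$\overline{a^2(n)} = \{\, \beta (d+1) + 2 [\, 1 - \beta r^2 (d+1) \,] (a^*)^2 \,\}\, \frac{1 - \delta^{n-1}}{2 - \beta r^2 (d+1)} - 2 (a^*)^T ( a^* - a ) [\, ( 1 - \beta r^2 )^{n-1} - \delta^{n-1} \,] + \delta^{n-1} a^2. \tag{20}$$

Collecting all the terms in $\delta^{n-1}$ and $( 1 - \beta r^2 )^{n-1}$,

$$\begin{aligned}
\overline{a^2(n)} = {} & \frac{\beta (d+1) + 2 [\, 1 - \beta r^2 (d+1) \,] (a^*)^2}{2 - \beta r^2 (d+1)} \\
& + \frac{2 ( a^* - a )^2 + \beta (d+1) [\, r^2 ( 2 (a^*)^T - a^T ) a - 1 \,]}{2 - \beta r^2 (d+1)}\, \delta^{n-1} \\
& - 2 (a^*)^T ( a^* - a ) ( 1 - \beta r^2 )^{n-1}. \qquad (21)
\end{aligned}$$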



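Equation (10) provides $\overline{a(n)}$ under assumptions (1) and (2), which
when squared yields

$$\overline{a(n)}^2 = (a^*)^2 - 2 (a^*)^T ( a^* - a ) ( 1 - \beta r^2 )^{n-1} + ( a^* - a )^2 [\, ( 1 - \beta r^2 )^2 \,]^{n-1}. \tag{22}$$

Combining equations (21) and (22), the variance of the gain vector is

$$\begin{aligned}
V[a(n)] &= \overline{a^2(n)} - \overline{a(n)}^2 \\
&= \frac{\beta (d+1) + 2 [\, 1 - \beta r^2 (d+1) \,] (a^*)^2}{2 - \beta r^2 (d+1)}
+ \frac{2 ( a^* - a )^2 + \beta (d+1) [\, r^2 ( 2 (a^*)^T - a^T ) a - 1 \,]}{2 - \beta r^2 (d+1)}\, \delta^{n-1} \\
&\quad - (a^*)^2 - ( a^* - a )^2 [\, ( 1 - \beta r^2 )^2 \,]^{n-1}. \qquad (23)
\end{aligned}$$

Combining the constant terms leaves

$$\begin{aligned}
V[a(n)] = {} & \frac{\beta (d+1) [\, 1 - r^2 (a^*)^2 \,]}{2 - \beta r^2 (d+1)}
+ \frac{2 ( a^* - a )^2 + \beta (d+1) [\, r^2 ( 2 (a^*)^T - a^T ) a - 1 \,]}{2 - \beta r^2 (d+1)}\, \delta^{n-1} \\
& - ( a^* - a )^2 [\, ( 1 - \beta r^2 )^2 \,]^{n-1},
\end{aligned}$$

where $\delta = ( 1 - \beta r^2 )^2 + \beta^2 r^4 d$. (24)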

Now that the variance of the gain vector has been derived, the
question is whether or not it converges, and if so, to what, and under
what conditions? The bounds on $\beta$ which imply convergence of the mean
are zero and $\dfrac{2}{r^2}$ (under both assumptions). Hence the bounds on
$( 1 - \beta r^2 )^2$ are zero and one. Therefore $\lim_{n\to\infty} [\, ( 1 - \beta r^2 )^2 \,]^{n-1} = 0$. On
the other hand $\delta$ is a quadratic expression in $\beta$ which is less than one for
$\beta$ between zero and $\dfrac{2}{(d+1) r^2}$, and therefore this lesser bound must be
observed in order for the term in $\delta$ to vanish, and consequently ensure
the convergence of the variance of the gain vector. Hence, if both
assumptions (1) and (2) are valid, and if $0 < \beta < \dfrac{2}{(d+1) r^2}$, then both
$\overline{a(n)}$ and $V[a(n)]$ will converge.

The mean will converge to $a^*$, that $a$ which minimizes $Q(a)$, and the
variance to

$$V = \lim_{n\to\infty} V[a(n)] = \frac{\beta (d+1) [\, 1 - r^2 (a^*)^2 \,]}{2 - \beta r^2 (d+1)}. \tag{25}$$
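The limiting variance can be cross-checked without random sampling by propagating the first and second moments of the gain vector exactly, averaging over all equally likely inputs at each step. The sketch below is an illustration under assumptions: the function names and the majority-vote environment are hypothetical, the initial gain vector is taken to be zero, and the limit it is compared against is beta (d+1) Q(a*) / (2 - beta r^2 (d+1)) with Q(a*) = 1 - r^2 (a*)^T a*.

```python
from itertools import product

def exact_variance(f, d, beta, n, r=1.0):
    """V[a(n)] = E[a^T a] - E[a]^T E[a] for a(k+1) = d_k + E_k a(k), a(1) = 0.

    m holds E[a] and S holds E[a a^T]; both are updated by averaging the
    one-step map over all 2**d inputs, which is exact because X(k) is
    independent of a(k).
    """
    inputs = [(r,) + bits for bits in product((-r, r), repeat=d)]
    N, w = d + 1, 1.0 / 2 ** d
    m = [0.0] * N
    S = [[0.0] * N for _ in range(N)]
    for _ in range(n - 1):
        m_new = [0.0] * N
        S_new = [[0.0] * N for _ in range(N)]
        for x in inputs:
            xm = sum(xi * mi for xi, mi in zip(x, m))
            Sx = [sum(S[i][k] * x[k] for k in range(N)) for i in range(N)]
            xSx = sum(x[i] * Sx[i] for i in range(N))
            Em = [m[i] - beta * x[i] * xm for i in range(N)]   # (I - beta x x^T) m
            dk = [beta * f(x) * xi for xi in x]                # d_k = beta f(X) X
            for i in range(N):
                m_new[i] += w * (dk[i] + Em[i])
                for k in range(N):
                    ESE = (S[i][k] - beta * x[i] * Sx[k] - beta * Sx[i] * x[k]
                           + beta ** 2 * x[i] * x[k] * xSx)    # E_k S E_k^T
                    S_new[i][k] += w * (dk[i] * dk[k] + dk[i] * Em[k]
                                        + Em[i] * dk[k] + ESE)
        m, S = m_new, S_new
    return sum(S[i][i] for i in range(N)) - sum(mi * mi for mi in m)

def limiting_variance(q_min, d, beta, r=1.0):
    """The limit of the variance as n grows, for 0 < beta < 2 / ((d+1) r^2)."""
    return beta * (d + 1) * q_min / (2.0 - beta * r ** 2 * (d + 1))

# Hypothetical environment: majority vote of three inputs, for which Q(a*) = 0.25.
majority = lambda x: 1 if sum(x[1:]) > 0 else -1
print(exact_variance(majority, d=3, beta=0.1, n=400))
print(limiting_variance(0.25, d=3, beta=0.1))
```

The two printed values should agree closely, since the transient terms decay geometrically with n.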



Now recall the relationship from equation (11) between $a^*$ and $Q(a^*)$
which enables equation (25) to be written as

$$V = \frac{\beta (d+1)\, Q(a^*)}{2 - \beta r^2 (d+1)},$$

where the bounds on $Q$ are zero and one,
and $Q$ can be interpreted as a measure of the complexity of the environment
to which the PCM is exposed. Since the derivative of $V$ with respect to
$\beta$ is

$$\frac{2 (d+1)\, Q(a^*)}{[\, 2 - \beta r^2 (d+1) \,]^2},$$

which is non negative, and since $V$ vanishes
for $\beta = 0$, the smaller $\beta$, the smaller $V$.

Exactly how small V must be in order for the state of the PCM to be 
minimized is an open question. Intuitively, it is felt that S(a) reaches 
its minimum before (in the input sequence) Q(a) is minimized, and the 
answer is probably also the answer to Theorem 1. Further work on this 
problem is indicated. Also, of course, is the need for generalizing this 
work to apply to any input sequence. Once these details are in order, it 
should be possible to obtain an analytic expression for the number (or 
average number) of trials to achieve a minimum for the state of the PCM. 
Other algorithms and configurations which may yield to analysis can be
found in Nilsson. [1]

In the appendix are the results of evaluating the variance of the 
gain vector for some representative values of the parameters. 
APPENDIX I

Assumptions: (1) The input components are $\pm 1$.

(2) The input sequence is uniform.

(3) $0 < \beta < \dfrac{2}{d+1}$.

$$V = \frac{\beta (d+1)\, Q(a^*)}{2 - \beta (d+1)}$$

Let $\beta = \dfrac{Z}{d+1}$; then $0 < Z < 2$ and

$$V = \frac{Z Q}{2 - Z} = Q \left[ \frac{2}{2 - Z} - 1 \right], \quad \text{where } Q = Q(a^*).$$

Note that $0 \le Q \le 1$ and observe Fig. 1. Figures 2 and 3 are
graphs of the variance when $Q = 0$ and therefore $V = 0$ for all values of
$Z$.

Figures 4 and 5 are graphs of the variance when $Q$ = [value illegible], in which case $V$ depends
on $Z$.
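The normalized form of the limiting variance is easy to tabulate. A minimal sketch, with a hypothetical function name and sample values of Z chosen only for illustration:

```python
def limiting_variance_z(z, q):
    """V = Z*Q / (2 - Z), valid for 0 < Z < 2, where Z = beta * (d + 1)."""
    if not 0.0 < z < 2.0:
        raise ValueError("Z must lie strictly between 0 and 2")
    return z * q / (2.0 - z)

for q in (1.00, 0.75, 0.50):
    print(q, [round(limiting_variance_z(z, q), 3) for z in (0.5, 1.0, 1.5)])
```

V grows without bound as Z approaches 2, which is the same upper limit on the adjustment parameter that the convergence argument requires.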
[Figure 1. $V = \dfrac{ZQ}{2 - Z}$, with curves for Q = 1.00, Q = .75, and Q = .50.]